An MCMC algorithm for haplotype assembly from whole-genome sequence data

Abstract

In comparison to genotypes, knowledge about haplotypes (the combination of alleles present on a single chromosome) is much more useful for whole-genome association studies and for making inferences about human evolutionary history. Haplotypes are typically inferred from population genotype data using computational methods. Whole-genome sequence data represent a promising resource for constructing haplotypes spanning hundreds of kilobases for an individual. In this article, we propose a Markov chain Monte Carlo (MCMC) algorithm, HASH (haplotype assembly for single human), for assembling haplotypes from sequenced DNA fragments that have been mapped to a reference genome assembly. The transitions of the Markov chain are generated using min-cut computations on graphs derived from the sequenced fragments. We have applied our method to infer haplotypes using whole-genome shotgun sequence data from a recently sequenced human individual. The high sequence coverage and presence of mate pairs result in fairly long haplotypes (N50 length ∼ 350 kb). Based on comparison of the sequenced fragments against the individual haplotypes, we demonstrate that the haplotypes for this individual inferred using HASH are significantly more accurate than the haplotypes estimated using a previously proposed greedy heuristic and a simple MCMC method. Using haplotypes from the HapMap project, we estimate the switch error rate of the haplotypes inferred using HASH to be quite low, ∼1.1%. Our Markov chain Monte Carlo algorithm represents a general framework for haplotype assembly that can be applied to sequence data generated by other sequencing technologies. The code implementing the methods and the phased individual haplotypes can be downloaded from http://www.cse.ucsd.edu/users/vibansal/HASH/.
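
To make the idea of sampling haplotypes from mapped fragments concrete, here is a minimal Python sketch of a Metropolis sampler over haplotype pairs. It is not the HASH implementation. Everything in it is an assumption made for illustration: the fragment representation (a dict from heterozygous site index to observed allele 0/1), the uniform per-base error rate `q`, the function names, and above all the proposal move. The sketch flips one randomly chosen site per step, which is closer in spirit to a simple MCMC baseline; the abstract states that HASH instead generates its transitions from min-cut computations on graphs derived from the fragments, which this sketch does not attempt.

```python
import math
import random


def fragment_log_likelihood(fragment, haplotype, q):
    """Log-probability of one fragment given the pair (haplotype, its complement).

    Assumption: each allele call is wrong independently with probability q,
    and the fragment comes from either chromosome with probability 1/2.
    """
    mismatches = sum(1 for site, allele in fragment.items() if haplotype[site] != allele)
    matches = len(fragment) - mismatches
    from_h = matches * math.log(1 - q) + mismatches * math.log(q)
    from_complement = mismatches * math.log(1 - q) + matches * math.log(q)
    # Stable log of the average of the two likelihoods.
    top = max(from_h, from_complement)
    return top + math.log(0.5 * (math.exp(from_h - top) + math.exp(from_complement - top)))


def log_likelihood(fragments, haplotype, q):
    """Sum of per-fragment log-likelihoods (fragments treated as independent)."""
    return sum(fragment_log_likelihood(f, haplotype, q) for f in fragments)


def mec_score(fragments, haplotype):
    """Number of allele calls that must change so that every fragment agrees
    with the haplotype or with its complement."""
    total = 0
    for fragment in fragments:
        mismatches = sum(1 for site, allele in fragment.items() if haplotype[site] != allele)
        total += min(mismatches, len(fragment) - mismatches)
    return total


def sample_haplotype(fragments, n_sites, q=0.05, steps=2000, seed=0):
    """Metropolis sampler with single-site flips (a stand-in proposal, not HASH's).

    Returns the lowest-MEC haplotype visited and its MEC score.
    """
    rng = random.Random(seed)
    current = [rng.randint(0, 1) for _ in range(n_sites)]
    current_ll = log_likelihood(fragments, current, q)
    best, best_mec = list(current), mec_score(fragments, current)
    for _ in range(steps):
        site = rng.randrange(n_sites)
        current[site] ^= 1  # propose flipping one site
        proposed_ll = log_likelihood(fragments, current, q)
        # Symmetric proposal, so accept with probability min(1, likelihood ratio).
        if rng.random() < math.exp(min(0.0, proposed_ll - current_ll)):
            current_ll = proposed_ll
            score = mec_score(fragments, current)
            if score < best_mec:
                best, best_mec = list(current), score
        else:
            current[site] ^= 1  # reject: undo the flip
    return best, best_mec


if __name__ == "__main__":
    # Toy fragments over four heterozygous sites (hypothetical data).
    toy_fragments = [{0: 0, 1: 1}, {1: 1, 2: 0}, {2: 0, 3: 1}, {0: 1, 1: 0, 2: 1}]
    haplotype, score = sample_haplotype(toy_fragments, n_sites=4)
    print(haplotype, score)
```

A haplotype and its complement describe the same phasing, so the sampler may return either one. With single-site flips, moving between two phasings that differ across many sites requires passing through low-probability intermediate states, which is one reason a sampler might prefer to flip larger, carefully chosen subsets of sites in a single step.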